117 research outputs found

c-REDUCE: Incorporating sequence conservation to detect motifs that correlate with expression

Author: A Stathopoulos
BC Foat
CT Harbison
D Cora
D Das
EM Conlon
Hao Li
HJ Bussemaker
JD Hughes
JD Thompson
Katerina Kechris
KD MacIsaac
LD Ward
M Kellis
M Markstein
MW Gaunt
O Elemento
P Cliften
R Siddharthan
R Wu
S Keles
SJ Ho Sui
T Barrett
T Wang
W Zhong
WJ Kent
WW Wasserman
X Cai
X Li
X Liu
Y Kawahara
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Computational methods for characterizing novel transcription factor binding sites search for sequence patterns or "motifs" that appear repeatedly in genomic regions of interest. Correlation-based motif finding strategies are used to identify motifs that correlate with expression data and do not rely on promoter sequences from a pre-determined set of genes. Results: In this work, we describe a method for predicting motifs that combines the correlation-based strategy with phylogenetic footprinting, where motifs are identified by evaluating orthologous sequence regions from multiple species. Our method, c-REDUCE, can account for variability at a motif position inferred from evolutionary information. c-REDUCE has been tested on ChIP-chip data for yeast transcription factors and on gene expression data in Drosophila. Conclusion: Our results indicate that utilizing sequence conservation information in addition to correlation-based methods improves the identification of known motifs.
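
The correlation-based strategy mentioned in this abstract can be illustrated with a minimal sketch. It is not the c-REDUCE implementation and it ignores conservation entirely; it only counts exact occurrences of candidate motifs in each promoter and ranks the motifs by the Pearson correlation between those counts and an expression value per gene. The function names and the toy sequences are assumptions made up for illustration.

```python
from math import sqrt

def count_motif(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_motifs(promoters, expression, motifs):
    """Return (motif, r) pairs sorted by absolute correlation, strongest first."""
    scored = []
    for motif in motifs:
        counts = [count_motif(p, motif) for p in promoters]
        scored.append((motif, pearson(counts, expression)))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy data: one promoter sequence and one expression value per gene.
promoters = ["ACGTACGT", "TTTTACGT", "TTTTTTTT", "GGGGCCCC"]
expression = [2.0, 1.0, 0.0, 0.0]
ranked = rank_motifs(promoters, expression, ["ACGT", "GGGG"])
```

In this toy input the counts of ACGT track expression exactly, so it ranks first. A conservation-aware variant would presumably weight or filter the counts using orthologous sequences, which this sketch does not attempt.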

Crossref

Directory of Open Access Journals

PubMed Central

Genome-Wide Profiling of H3K56 Acetylation and Transcription Factor Binding Sites in Human Adipocytes

Author: A Marson
A Subramanian
Amy P. Baumann
B Schwer
C Das
Christopher J. Donahue
CM Conboy
D Dutta
DT Odom
ED Rosen
Ernest Fraenkel
F Liang
GW Swart
H Tilg
K Orford
KD MacIsaac
Kinyui Alice Lo
L He
L Janderova
L Qiao
Lisa S. Hayes
Marc Tjwa
Mark A. Thiede
Mary K. Bauchmann
MI Lefterova
MS Hamza
NJ Butcher
PD Thomas
PW Caton
R Nielsen
RM Cowherd
SD Westerheide
Shelley Ann G. des Etages
SM Rangwala
T Yamauchi
TS Mikkelsen
W Huang da
W Xie
Publication venue: Public Library of Science (PLoS)
Publication date: 01/10/2010

The growing epidemic of obesity and metabolic diseases calls for a better understanding of adipocyte biology. The regulation of transcription in adipocytes is particularly important, as it is a target for several therapeutic approaches. Transcriptional outcomes are influenced by both histone modifications and transcription factor binding. Although the epigenetic states and binding sites of several important transcription factors have been profiled in the mouse 3T3-L1 cell line, such data are lacking in human adipocytes. In this study, we identified H3K56 acetylation sites in human adipocytes derived from mesenchymal stem cells. H3K56 is acetylated by CBP and p300, and deacetylated by SIRT1, all of which are proteins with important roles in diabetes and insulin signaling. We found that while almost half of the genome shows signs of H3K56 acetylation, the highest level of H3K56 acetylation is associated with transcription factors and proteins in the adipokine signaling and Type II Diabetes pathways. In order to discover the transcription factors that recruit acetyltransferases and deacetylases to sites of H3K56 acetylation, we analyzed DNA sequences near H3K56 acetylated regions and found that the E2F recognition sequence was enriched. Using chromatin immunoprecipitation followed by high-throughput sequencing, we confirmed that genes bound by E2F4, as well as those bound by HSF-1 and C/EBPα, have higher than expected levels of H3K56 acetylation, and that the transcription factor binding sites and acetylation sites are often adjacent but rarely overlap. We also discovered a significant difference between bound targets of C/EBPα in 3T3-L1 and human adipocytes, highlighting the need to construct species-specific epigenetic and transcription factor binding site maps.
This is the first genome-wide profile of H3K56 acetylation, E2F4, C/EBPα and HSF-1 binding in human adipocytes, and will serve as an important resource for better understanding adipocyte transcriptional regulation.

Funding: Singapore. Agency for Science, Technology and Research (National Science Scholarship); Massachusetts Institute of Technology (Eugene Bell Career Development Chair); National Science Foundation (U.S.) (Award No. DBI-0821391); Pfizer Inc

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

Cancer somatic mutations cluster in a subset of regulatory sites predicted from the ENCODE data

Author: A Mortazavi
A Pohl
A Visel
AP Boyle
C Melton
David R. Westhead
DK Goode
FW Huang
J Ernst
JA Wamstad
JH Friedman
JR Landry
KD MacIsaac
M. S. Vijayabaskar
MB Gerstein
MS Lawrence
N Weinhold
Nisar A. Shar
NJ Fredriksson
PA Futreal
RE Thurman
RS Hansen
S Djebali
SA Forbes
TH Rabbitts
WJ Kent
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in the process are associated with cancer. The ENCODE project has mapped potential regulatory sites across the complete genome in many cell types, and these regions have been shown to harbour many of the somatic mutations that occur in cancer cells, suggesting that their effects may drive cancer initiation and development. The ENCODE data suggests a very large number of regulatory sites, and methods are needed to identify those that are most relevant and to connect them to the genes that they control. Methods: Predictive models of gene expression were developed by integrating the ENCODE data for regulation, including transcription factor binding and DNase1 hypersensitivity, with RNA-seq data for gene expression. A penalized regression method was used to identify the most predictive potential regulatory sites for each transcript. Known cancer somatic mutations from the COSMIC database were mapped to potential regulatory sites, and we examined differences in the mapping frequencies associated with sites chosen in regulatory models and other (rejected) sites. The effects of potential confounders, for example replication timing, were considered. Results: Cancer somatic mutations preferentially occupy those regulatory regions chosen in our models as most predictive of gene expression. Conclusion: Our methods have identified a significantly reduced set of regulatory sites that are enriched in cancer somatic mutations and are more predictive of gene expression. This has significance for the mechanistic interpretation of cancer mutations, and the understanding of genetic regulation.

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

FigShare

Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

Author: A Ambesi-Impiombato
A Blais
A Eto
A Subramanian
AE Kel
AG Clark
AL Lam
AM McGuire
Anat Reiner
Assif Yitzhaky
B Ren
C Kimura-Yoshida
C Plessy
C Yang
CT Harbison
D Pfeifer
D Wang
DB Allison
E Emberly
E Segal
Eytan Domany
FP Roth
GC Pipes
GC Yuan
GQ Yao
GZ Hertz
H Li
H Lodish
J Zheng
JD Hughes
JL DeRisi
JQ Ling
K Frech
K Quandt
KD MacIsaac
L Amir-Zilberstein
L Elnitski
L Marino-Ramirez
L McCue
M Ashburner
M Kellis
M Milyavsky
MA Nobrega
Mark Koudritsky
MC Frith
ML Howard
ML Whitfield
N Rajewsky
Or Zuk
P Carninci
P Cliften
PM Haverty
PR Buckland
R Elkon
R Liu
R Sharan
Ran Brosh
S Aerts
S Rashi-Elkeles
S Tavazoie
SJ Cooper
SJ Ho Sui
Sui Huang
U Gerland
Varda Rotter
WW Wasserman
X Xie
Y Barash
Y Benjamini
Y Tabach
Yossi Buganim
Yuval Tabach
Z Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2007

We introduce a novel method to screen the promoters of a set of genes with shared biological function against a precompiled library of motifs, and to find those motifs that are statistically over-represented in the gene set. The gene sets were obtained from the functional Gene Ontology (GO) classification; for each set and motif we optimized the sequence similarity score threshold, independently for every location window (measured with respect to the TSS), taking into account the location-dependent nucleotide heterogeneity along the promoters of the target genes. We performed a high-throughput analysis, searching the promoters (from 200 bp downstream to 1000 bp upstream of the TSS) of more than 8000 human and 23,000 mouse genes, for 134 functional Gene Ontology classes and for 412 known DNA motifs. When combined with binding site and location conservation between human and mouse, the method identifies with high probability functional binding sites that regulate groups of biologically related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were put to several experimental tests. By allowing a "flexible" threshold and combining our functional class and location specific search method with conservation between human and mouse, we are able to reliably identify functional TF binding sites. This is an essential step towards constructing regulatory networks and elucidating the design principles that govern transcriptional regulation of expression. The promoter region proximal to the TSS appears to be of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast.

Comment: 31 pages, including Supplementary Information and figure
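
Statistical over-representation of a motif in a gene set is commonly assessed with a hypergeometric tail probability. The sketch below shows that standard test only; it is an assumption that it resembles the statistic used in the paper, and the per-window threshold optimization described in the abstract is not reproduced. The function name and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_tail(k, population, carriers, sample):
    """P(X >= k): probability that at least k of `sample` genes drawn without
    replacement from `population` genes carry the motif, when `carriers`
    genes in the population carry it."""
    upper = min(carriers, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(carriers, i) * comb(population - carriers, sample - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total

# Hypothetical toy numbers: 20 genes in the background, 5 carry the motif,
# and a GO gene set of 4 genes contains 3 motif carriers.
p_value = hypergeom_tail(3, population=20, carriers=5, sample=4)
```

A small p-value suggests the motif occurs in the gene set more often than chance would predict. With many gene sets and motifs tested, a multiple-testing correction would also be needed.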

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The value of position-specific priors in motif discovery using MEME

Author: BC Foat
CT Harbison
DC Bauer
E Redhead
F Fang
FA Buske
GD Stormo
GZ Hertz
KD MacIsaac
L Narlikar
MC Frith
Mikael Bodén
Philip Machanick
R Gordân
R Siddharthan
RC McLeay
S Sinha
Timothy L Bailey
TL Bailey
Tom Whitington
V Matys
WH Kruskal
WJ Kent
X Chen
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). Results: We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. Conclusions: We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
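
The idea of a prior over the location of motif sites can be shown with a small Bayes-rule sketch. This is not MEME's EM procedure: it assumes each candidate start position already has a motif-versus-background likelihood ratio, and only shows how a position-specific prior reweights those positions. All names and numbers are hypothetical.

```python
def site_posterior(likelihood_ratios, prior):
    """Posterior over motif start positions, proportional to
    prior[i] * likelihood_ratios[i] and normalized to sum to 1."""
    weighted = [p * lr for p, lr in zip(prior, likelihood_ratios)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical sequence with three candidate start positions.
# Positions 0 and 1 match the motif model equally well.
likelihood_ratios = [4.0, 4.0, 1.0]

uniform_prior = [1 / 3, 1 / 3, 1 / 3]
conservation_prior = [0.1, 0.8, 0.1]  # assumed: position 1 lies in a conserved region

flat = site_posterior(likelihood_ratios, uniform_prior)
informed = site_posterior(likelihood_ratios, conservation_prior)
```

With the uniform prior the two matching positions tie. With the informed prior, position 1 receives most of the posterior mass, which is how auxiliary information such as conservation can break ties between equally good sequence matches.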

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Author: A Ben-Dor
A Prelic
A Rosenwald
A Tanay
AD Basehoar
Akdes Serin
B Andreopoulos
BKH Chia
CT Harbison
D Burdick
DR Ciocca
G Li
GA Grothaus
J Lamb
JA Hartigan
JL Jensen
JN Keller
KD MacIsaac
Martin Vingron
R Shamir
RR Sokal
S Barkow
S Bergmann
S Hochreiter
SC Madeira
TM Murali
TR Hughes
XG Ni
Y Cheng
Y Hoshida
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The analysis of massive high throughput data via clustering algorithms is very important for elucidating gene functions in biological systems. However, traditional clustering methods have several drawbacks. Biclustering overcomes these limitations by grouping genes and samples simultaneously. It discovers subsets of genes that are co-expressed in certain samples. Recent studies showed that biclustering has a great potential in detecting marker genes that are associated with certain tissues or diseases. Several biclustering algorithms have been proposed. However, it is still a challenge to find biclusters that are significant based on biological validation measures. Besides that, there is a need for a biclustering algorithm that is capable of analyzing very large datasets in reasonable time. Results: Here we present a fast biclustering algorithm called DeBi (Differentially Expressed BIclusters). The algorithm is based on a well-known data mining approach called frequent itemset. It discovers maximum size homogeneous biclusters in which each gene is strongly associated with a subset of samples. We evaluate the performance of DeBi on a yeast dataset, on synthetic datasets and on human datasets. Conclusions: We demonstrate that the DeBi algorithm provides functionally more coherent gene sets compared to standard clustering or biclustering algorithms using biological validation measures such as Gene Ontology term and Transcription Factor Binding Site enrichment. We show that DeBi is a computationally efficient and powerful tool in analyzing large datasets. The method is also applicable to multiple gene expression datasets coming from different labs or platforms.
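
The link between frequent itemsets and biclusters can be sketched as follows. This is not the DeBi algorithm: it is a brute-force illustration that assumes a simple fixed threshold for binarization, treats each gene as a "transaction" whose items are the samples where it is up-regulated, and lists every sample set shared by enough genes. Function names, the threshold, and the toy matrix are assumptions.

```python
from itertools import combinations

def binarize(matrix, threshold):
    """Map each gene to the set of sample indices where expression exceeds threshold."""
    return {gene: {j for j, value in enumerate(row) if value > threshold}
            for gene, row in matrix.items()}

def frequent_sample_sets(transactions, min_genes, min_samples=2):
    """Enumerate sample sets shared by at least min_genes genes.
    Brute force over all subsets, so only suitable for tiny examples."""
    samples = sorted(set().union(*transactions.values()))
    found = []
    for size in range(min_samples, len(samples) + 1):
        for itemset in combinations(samples, size):
            genes = sorted(g for g, s in transactions.items() if set(itemset) <= s)
            if len(genes) >= min_genes:
                found.append((itemset, genes))
    return found

# Hypothetical expression matrix: gene -> expression value in four samples.
matrix = {
    "g1": [2.1, 1.9, 0.1, 0.0],
    "g2": [1.8, 2.2, 0.2, 0.1],
    "g3": [0.0, 0.1, 2.0, 1.7],
    "g4": [2.0, 2.4, 0.0, 1.6],
}
biclusters = frequent_sample_sets(binarize(matrix, 1.0), min_genes=3)
```

Each result pairs a set of samples with the genes that are "on" in all of them, which is one candidate bicluster. Practical frequent-itemset miners avoid exhaustive enumeration by pruning the search, which this sketch omits.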

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

A Bayesian Partition Method for Detecting Pleiotropic and Epistatic eQTL Modules

Author: A Colman-Lerner
A Manichaikul
AC Cervino
AH Enyenihi
BM Bolstad
C Jiang
CJ Geyer
CM Kendziorski
CY Wu
D Mangin
EE Schadt
Eric E. Schadt
ES Lander
G Yvert
Gary D. Stormo
J Ronald
J Zhu
JD Storey
JS Liu
Jun S. Liu
Jun Zhu
KD MacIsaac
M Morley
N Yi
PJ Green
RB Brem
SI Lee
TR Hughes
V Emilsson
W Zou
Wei Zhang
Y Chen
Publication venue: Public Library of Science
Publication date: 01/01/2010

Studies of the relationship between DNA variation and gene expression variation, often referred to as “expression quantitative trait loci (eQTL) mapping”, have been conducted in many species and resulted in many significant findings. Because of the large number of genes and genetic markers in such analyses, it is extremely challenging to discover how a small number of eQTLs interact with each other to affect mRNA expression levels for a set of co-regulated genes. We present a Bayesian method to facilitate the task, in which co-expressed genes mapped to a common set of markers are treated as a module characterized by latent indicator variables. A Markov chain Monte Carlo algorithm is designed to search simultaneously for the module genes and their linked markers. We show by simulations that this method is more powerful for detecting true eQTLs and their target genes than traditional QTL mapping methods. We applied the procedure to a data set consisting of gene expression and genotypes for 112 segregants of S. cerevisiae. Our method identified modules containing genes mapped to previously reported eQTL hot spots, and dissected these large eQTL hot spots into several modules corresponding to possibly different biological functions or primary and secondary responses to regulatory perturbations. In addition, we identified nine modules associated with pairs of eQTLs, of which two have been previously reported. We demonstrated that one of the novel modules containing many daughter-cell expressed genes is regulated by AMN1 and BPH1. In conclusion, the Bayesian partition method which simultaneously considers all traits and all markers is more powerful for detecting both pleiotropic and epistatic effects based on both simulated and empirical data.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Predicting Target DNA Sequences of DNA-Binding Proteins Based on Unbound Structures

Author: A Dan
A Sandelin
AV Morozov
BR Brooks
BS Xu
C Zhang
Chien-Yu Chen
Chih-Kang Lin
Chih-Wei Lin
Darby Tien-Hao Chang
DS Johnson
ED Siggia
EP Xing
GD Stormo
J Kirchmair
JE Donald
JJ Havranek
JW Ponder
KD MacIsaac
M Gao
M van Dijk
ML Bulyk
R Apweiler
RG Endres
RV Hogg
S Mahony
SF Altschul
TEr Cheatham
Ting-Ying Chien
V Matys
Vladimir N. Uversky
Y Zhang
Yi-Zhong Weng
ZJ Liu
Publication venue: Public Library of Science
Publication date: 01/02/2012

DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind to specific sequences in the genome to initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step to understand many biological processes. Recent studies have shown that knowledge-based potential functions can be applied on protein-DNA co-crystallized structures to generate PWMs that are considerably consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study aims at investigating the possibility of predicting the DNA sequences bound by DNA-binding proteins from the proteins' unbound structures (structures of the unbound state). Given an unbound query protein and a template complex, the proposed method first employs structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The evaluation of the proposed method is based on seven DNA-binding proteins, which have structures of both DNA-bound and unbound forms for prediction as well as annotated PWMs for validation. Since this work is the first attempt to predict target sequences of DNA-binding proteins from their unbound structures, three types of structural variations that presumably influence the prediction accuracy were examined and discussed. Based on the analyses conducted in this study, the conformational change of proteins upon binding DNA was shown to be the key factor. 
This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures, which encourages more efforts on the structure alignment-based approaches in addition to docking- and homology modeling-based approaches for generating synthetic complexes.
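
For readers unfamiliar with position weight matrices (PWMs), the sketch below shows what such a matrix is and how it scores candidate binding sites. It builds the matrix from aligned example sites, which is a different route from the structure-based potential functions described in the abstract; the aligned sites, pseudocount, and uniform background are assumptions chosen for illustration.

```python
from math import log2

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites: one dict per position
    mapping each base to log2(frequency / background)."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(sites)
hit_score, hit_start = best_hit(pwm, "CCCTGACTCAGG")
```

The scan locates the consensus site inside the longer sequence. A predicted PWM can be compared against an annotated one by scoring the same sequences with both, among other approaches.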

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Predictive Model of the Oxygen and Heme Regulatory Network in Yeast

Author: A Kundaje
A Smith
A Tanay
AJ Hartemink
AJ Kastaniotis
AM Erkine
Anshul Kundaje
AP Gasch
AV Grishin
BM Bolstad
C Dagsgaard
CH Yeang
Changgui Lan
Christina Leslie
CT Harbison
CV Lowry
D Pe'er
E Segal
FM Ausubel
FP Roth
Herbert M. Sauro
HF Bunn
HJ Bussemaker
J Ernst
J Ihmels
J Olesen
JC Schneider
JD Hughes
JJ ter Linde
JY Choi
K Pfeifer
KA Morano
KD MacIsaac
KE Kwast
KV Shianna
L Guarente
L Zhang
L-C Lai
Li Zhang
M Kaern
M Middendorf
MA Beer
MD Piper
Mei Zhou
MJ Vasconcelles
MK Yeung
MR Grably
N Abramova
N Rachidi
NE Abramova
O Sertil
PV Burke
R Schapire
RA Irizarry
RE Schapire
RS Zitomer
S Kuge
S Labbé
S Tavazoie
SL Tai
Steve Lianoglou
T Hon
T Hoppe
T Keng
T Prezant
TI Lee
TS Gardner
VV Svetlov
Xiantong Xin
Y Benjamini
Y Freund
Y Jiang
Y Pilpel
Y Tu
Z Bar-Joseph
Publication venue: Public Library of Science
Publication date: 01/11/2008

Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. 
In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

Author: A Price
AD Smith
BJ Davids
CT Harbison
CT Workman
D La
E Segal
Emma Redhead
GD Stormo
GE Crooks
GZ Hertz
H Marks
HCM Leung
J Buhler
J Fang
J Zhu
JD Hughes
JJ Hu
KD MacIsaac
M Akerman
M Brown
M Giufrè
M Tompa
MC Frith
MO Dayhoff
OG Berg
PA Pevzner
R Durbin
R Sharan
S Gupta
S Sinha
SR Krig
TD Schneider
Timothy L Bailey
TL Bailey
WH Press
WP Lehrach
X Liu
XS Liu
Y Barash
ZN Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. Results: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Conclusion: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences.
With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at http://bioinformatics.org.au/deme/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central